105 research outputs found
Multi-Agent Simulation of Emergence of Schwa Deletion Pattern in Hindi
Recently, there has been a revival of interest in multi-agent simulation techniques for exploring the nature of language change. However, a lack of appropriate validation of simulation experiments against real language data often calls into question the general applicability of these methods in modeling realistic language change. We try to address this issue here by making an attempt to model the phenomenon of schwa deletion in Hindi through a multi-agent simulation framework. The pattern of Hindi schwa deletion and its diachronic nature are well studied, not only out of general linguistic inquiry, but also to facilitate Hindi grapheme-to-phoneme conversion, which is a preprocessing step to text-to-speech synthesis. We show that under certain conditions, the schwa deletion pattern observed in modern Hindi emerges in the system from an initial state of no deletion. The simulation framework described in this work can be extended to model other phonological changes as well.Language Change, Linguistic Agent, Language Game, Multi-Agent Simulation, Schwa Deletion
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI
Natural Language Inference (NLI) is considered a representative task to test
natural language understanding (NLU). In this work, we propose an extensible
framework to collectively yet categorically test diverse Logical reasoning
capabilities required for NLI (and by extension, NLU). Motivated by behavioral
testing, we create a semi-synthetic large test-bench (363 templates, 363k
examples) and an associated framework that offers following utilities: 1)
individually test and analyze reasoning capabilities along 17 reasoning
dimensions (including pragmatic reasoning), 2) design experiments to study
cross-capability information content (leave one out or bring one in); and 3)
the synthetic nature enable us to control for artifacts and biases. The
inherited power of automated test case instantiation from free-form natural
language templates (using CheckList), and a well-defined taxonomy of
capabilities enable us to extend to (cognitively) harder test cases while
varying the complexity of natural language. Through our analysis of
state-of-the-art NLI systems, we observe that our benchmark is indeed hard (and
non-trivial even with training on additional resources). Some capabilities
stand out as harder. Further fine-grained analysis and fine-tuning experiments
reveal more insights about these capabilities and the models -- supporting and
extending previous observations. Towards the end we also perform an user-study,
to investigate whether behavioral information can be utilised to generalize
much better for some models compared to others.Comment: arXiv admin note: substantial text overlap with arXiv:2107.0722
LLM-powered Data Augmentation for Enhanced Cross-lingual Performance
This paper explores the potential of leveraging Large Language Models (LLMs)
for data augmentation in multilingual commonsense reasoning datasets where the
available training data is extremely limited. To achieve this, we utilise
several LLMs, namely Dolly-v2, StableVicuna, ChatGPT, and GPT-4, to augment
three datasets: XCOPA, XWinograd, and XStoryCloze. Subsequently, we evaluate
the effectiveness of fine-tuning smaller multilingual models, mBERT and XLMR,
using the synthesised data. We compare the performance of training with data
generated in English and target languages, as well as translated
English-generated data, revealing the overall advantages of incorporating data
generated by LLMs, e.g. a notable 13.4 accuracy score improvement for the best
case. Furthermore, we conduct a human evaluation by asking native speakers to
assess the naturalness and logical coherence of the generated examples across
different languages. The results of the evaluation indicate that LLMs such as
ChatGPT and GPT-4 excel at producing natural and coherent text in most
languages, however, they struggle to generate meaningful text in certain
languages like Tamil. We also observe that ChatGPT falls short in generating
plausible alternatives compared to the original dataset, whereas examples from
GPT-4 exhibit competitive logical consistency.Comment: EMNLP 2023 Main Conferenc
- β¦